Leveraging Subjective Human Annotation for Clustering Historic Newspaper Articles

Authors

  • Haimonti Dutta
  • William Chan
  • Deepak Shankargouda
  • Manoj Pooleery
  • Axinia Radeva
  • Kyle Rego
  • Boyi Xie
  • Rebecca J. Passonneau
  • Austin Lee
  • Barbara Taranto
Abstract

  • Haimonti Dutta, The Center for Computational Learning Systems
  • William Chan, Department of Computer Science
  • Deepak Shankargouda, Department of Computer Science
  • Manoj Pooleery, The Center for Computational Learning Systems
  • Axinia Radeva, The Center for Computational Learning Systems
  • Kyle Rego, Department of Computer Science
  • Boyi Xie, The Center for Computational Learning Systems
  • Rebecca J. Passonneau, The Center for Computational Learning Systems
  • Austin Lee, The Center for Computational Learning Systems
  • Barbara Taranto, New York Public Library

Similar Resources

Learning Parameters of the K-Means Algorithm From Subjective Human Annotation

The New York Public Library is participating in the Chronicling America initiative to develop an online searchable database of historically significant newspaper articles. Microfilm copies of the papers are scanned and high resolution OCR software is run on them. The text from the OCR provides a wealth of data and opinion for researchers and historians. However, the categorization of articles p...

Textual Article Clustering in Newspaper Pages

In the analysis of a newspaper page an important step is the clustering of various text blocks into logical units, i.e., into articles. We propose three algorithms based on text processing techniques to cluster articles in newspaper pages. Based on the complexity of the three algorithms and experimentation on actual pages from the Italian newspaper L’Adige, we select one of the algorithms as the pre...

Clustering in Newspaper Pages

In the analysis of a newspaper page an important step is the clustering of various text blocks into logical units, i.e., into articles. We propose three algorithms based on text processing techniques to cluster articles in newspaper pages. Based on the complexity of the three algorithms and experimentation on actual pages from the Italian newspaper L’Adige, we select one of the algorithms as th...

Searching the News: Using a Rich Ontology with Time-Bound Roles to Search through Annotated Newspaper Archives

A frequent motivation for annotating documents using ontologies is to allow more efficient search. For collections of newspaper articles, it is often difficult to find specific articles based on keywords or topics alone. This paper describes a system that uses a formalisation of the content of newspaper articles to answer complex queries. The data for this system is created using Relational Con...

Evaluation Set for Slovak News Information Retrieval

This work proposes an information retrieval evaluation set for the Slovak language. A set of 80 queries written in natural language is given together with the set of relevant documents. The document set contains 3980 newspaper articles sorted into 6 categories. Each document in the result set is manually annotated for relevance to its corresponding query. The evaluation set is mostly comp...

Journal title:
  • CoRR

Volume abs/1208.3530

Publication date 2012